1. Introduction

Advances in experimental and theoretical work increasingly suggest that parasite interactions within a single host can affect the spread and severity of wildlife diseases. Yet empirical data to support predicted co-infection patterns are limited due to the practical challenges of gathering convincing data from animal populations and the stochastic nature of parasite transmission.

Here, we investigated co-infection patterns between micro- (bacteria and protozoa) and macroparasites (gastrointestinal helminths) in natural populations of the multimammate mouse (Mastomys natalensis).

We hypothesize that:

Mastomys natalensis

2. Set up R & the data

First, we need to setup R. HMSC models generally generate a lot of output (models, panels, etc). It is therefore easier to put this in different maps in order to avoid chaos.

So, let’s create 3 different folders to store models, data and panels

localDir = "."
ModelDir = file.path(localDir, "models")
DataDir = file.path(localDir, "data")

Now we can read in the data.

SXY = read.csv2("BartPaper.csv", stringsAsFactors=TRUE,sep=",") 

The file that we’ll load is the full dataset.
It consists generally out of three parts:

Let’s first look at the data

2.1. Data inspection

S: study design

##   Individual Trap_Site
## 1    TZ34866      co-3
## 2    TZ34867      co-3
## 3    TZ34868      co-3
## 4    TZ34869      co-3

The study design consists out of:

  • Individual = the individuals’ unique code (N= 211)
  • Trap_site = the location where the individual is caught

We Trapped the rodents in Morogoro, Tanzania on 11 different sites.

The mean distance between every plot is 992m (min = 76m, max = 2073m).

The average dispersion distance of M. natalensis is 300m, while the average home range sizes are relatively small (±650m² using capture-mark recapture data and ±1200m² using radio tracking). Most plots are spaced more than 300m from each other and are therefore independent of each other, except for plots CO1, CO3 and CO10 which are indeed very close to each other (marked in red on the table). In these plots, we are probably sampling the same population. We therefore decided to consider these three locations as one.

The number of trapped individuals per site are:

## 
## co-11  co-2  co-3  co-4  co-5  co-6  co-7  co-8  co-9 
##     2     3    88    13    26    24     3    41    11

X: covariates to be used as predictors

We will use the following variables as fixed effects in the model:

  • Sex (Male or female)
  • Exploration behaviour (continious)
  • Stress-sensitivity (continious)
  • Eye lens weight or reproductive success

Let’s look at all of them separeltly.

  1. Sex
## 
##   F   M 
## 136  75

Fewer males than females. Still the dataset looks sufficient.

  1. Exploration & Stress sensitivity:

Looks good.

  1. Eye Lens weight or reporoductive success.

An important driver of parasitic infection probability is the individuals’ exposure time to the parasite. The longer you are exposed, the more likely you are to eventually become infected.

This can be estimated either using the individuals reproductive age (adult or juveniles; adults are assumed to have a longer exposure time than juveniles) or by using Eye Lens weight which is an unbiased proxy for age in small mammals, and thus consequently, exposure time.

We’ll first look at eye lens weight

We should note that were not able to measure the eye lens weight of two individuals. We therefore decided to give them the mean value, and therefore will not influence the results.

SXY$EyeLens <- as.numeric(as.character(SXY$EyeLens))
A<-SXY$EyeLens[1:30]
B<-SXY$EyeLens[33:211]
Q<-c(A,B)
SXY$EyeLens[31]<-mean(Q)
SXY$EyeLens[32]<-mean(Q)

From this graph, it is clear that there are two big age categories. A large proportion have an eye lens weight of less than 20mg, while there is a group with an eye lens weight of more than 25mg.

Let’s transfer this to the actual age, based on the the following formula:
\[ w = -10.46088 + (4.35076 * ln(a)) \]

with:

  • w = dry eye lens weight in mg
  • a = age in days

If we transform that, it will be:

\[ a = e^{ \frac{10.46088 + w }{4.35076} } \]

So let’s calculate this for each ID and plot it

SXY$AgeDays <-  exp((10.46088+(SXY$EyeLens/2))/4.35076)

From here, it is clear that a large group is less than 100 days old, while others are almost one year old.
This means there are two different age groups: those born in the year when we conducted our study, and others which were borne during the previous breeding seasons, the year before.

THere are now multiple ways to include this into the model:

  • Use Eye lens weight as a continious predictor
  • Group the individuals in separate classes

The first option will lead to a large bias, since the the small proportion of animals that were born in the previous breeding season will have an extreme large effect on the correlation.

Grouping them in different categories is therefore more sensible. We will divide them in the following groups:

  • < 50 days old. These individuals are less than 2 months old and are not reproductive active
  • 50 > x < 200. These are all adults, but born on within the year when we conducted the fieldwork
  • olde than 200 days . These are all individuals that were born during the previous breeding season.

Let’s make these groups in the dataset

SXY$Age2 <-c(1:nrow(SXY))   # MAke a new column first

for (i in 1:nrow(SXY)){
  if (SXY$AgeDays[i]<50){
    SXY$Age2[i] <- 1
  } else {
    SXY$Age2[i] <- 2
  }
}

for (i in 1:nrow(SXY)){
  if (SXY$AgeDays[i]>200){
    SXY$Age2[i] <- 3
  } 
}

SXY$Age2<-as.factor(SXY$Age2)
table(SXY$Age2)
## 
##   1   2   3 
## 100  88  23

Now let’s take a look at reproductive age.

We see that juveniles are indeed a lot younger than adults (except three individuals, which we will investigate later on). This also confirms that the assumptions in Vanden Broecke et al. (2021) were correct.
Additionally, the two individuals without eye lens weight were also considered to be adults (the two red points), which is also reassuring that they will have little effect on the dataset.

Now, it is clear that we cannot run a model with both Reproductive age and age in days as fixed effects. We will therefore have two seprate models where we either have reproductive age as a fixed effect, or age (in days, based on the three categories we defined previously).

Oke, done.

Y: Species data

We have several parasites that we will include in our model:

  • Hymenolepis
  • Davaineida
  • Protospirura
  • Trichostrongylidae
  • Trichuris
  • Syphacia
  • Bartonella
  • Anaplasma

First, we’ll look at the helminths

A quick view, we see that that there is some correlation between the different infections.
For instance, individuals infected with Protospirura are also more likely to be infected with Dava and Trichu.

The number of infected individuals and the corresponding prevalence is shown in the table below

Names InfectedID Prevalence
Hymenolepis 96 0.4549763
Davaineida 59 0.2796209
Protospirura 74 0.3507109
Trichostrongylidae 30 0.1421801
Trichuris 30 0.1421801
Syphacia 45 0.2132701

For Bartonella and Anaplasma, things are a bit more complicated.
Some samples were uncertain (i.e., individuals that were repeatedly positive on the qPCR but negative for Sanger sequencing on the conventional PCR), we will therefore run two different models where we considered all uncertain samples either as negative or positive (further referred to as, respectively, the uncertain-negative and uncertain-positive model).

First, we will need to account for this. We’ll add a column where we either consider the uncertains as positive or negative, both for Bartonella and for Anaplasma

# Bartonella

#### Uncertain --> POSITIVE
SXY$bartPOS <- as.numeric(SXY$bart)

for (i in 1:nrow(SXY)){
  if(SXY$bart[i] == "neg"){SXY$bartPOS[i] <- 0} 
  if(SXY$bart[i] == "pos"){SXY$bartPOS[i] <- 1} 
  if(SXY$bart[i] == "uncertain"){SXY$bartPOS[i] <- 1}} # Uncertains are now positive

#### Uncertain --> Negative
SXY$bartNEG <- as.numeric(SXY$bart)

for (i in 1:nrow(SXY)){
  if(SXY$bart[i] == "neg"){SXY$bartNEG[i] <- 0} 
  if(SXY$bart[i] == "pos"){SXY$bartNEG[i] <- 1} 
  if(SXY$bart[i] == "uncertain"){SXY$bartNEG[i] <- 0}} # Uncertains are now Negative



# Anaplasma

#### Uncertain --> POSITIVE
SXY$anaPOS <- as.numeric(SXY$ana)

for (i in 1:nrow(SXY)){
  if(SXY$ana[i] == "neg"){SXY$anaPOS[i] <- 0} 
  if(SXY$ana[i] == "pos"){SXY$anaPOS[i] <- 1} 
  if(SXY$ana[i] == "uncertain"){SXY$anaPOS[i] <- 1}} # Uncertains are now positive

#### Uncertain --> Negative
SXY$anaNEG <- as.numeric(SXY$ana)

for (i in 1:nrow(SXY)){
  if(SXY$ana[i] == "neg"){SXY$anaNEG[i] <- 0} 
  if(SXY$ana[i] == "pos"){SXY$anaNEG[i] <- 1} 
  if(SXY$ana[i] == "uncertain"){SXY$anaNEG[i] <- 0}} # Uncertains are now positive
Names Infected_ID_POSITIVE Infected_ID_NEGATIVE Prevalence_POSITIVE Prevalence_NEGATIVE
Bartonella 22 15 0.1042654 0.0710900
Anaplasma 40 31 0.1895735 0.1469194

2.2. Split the data & include a trait datafile

In order to continue with the HMSC models, we need to split the S, X and Y data from each other and store it as a new R file.

For the positive models this will look like this

# S: study design
S=SXY[,1:2] 
S[1:4,]
##   Individual Trap_Site
## 1    TZ34866      co-3
## 2    TZ34867      co-3
## 3    TZ34868      co-3
## 4    TZ34869      co-3
# X: covariates to be used as predictors
X=SXY[,c(3,4,10,11,26)] 
X[1:4,]
##   Sex Age Exploration Stress_Sens Age2
## 1   F   J  -1.5465289  1.11937868    1
## 2   M   A   0.9608683  1.30348080    3
## 3   F   A   0.2064169 -0.28306460    3
## 4   M   J   0.4458350  0.03908914    1
# Y: species data
Y=SXY[,c(12:17,33,35)] 
Y[1:4,]
##   Hymenolepis Davaineidae Protospirura Trichostrongylidae Trichuris Syphacia
## 1           0           0            0                  2         0        0
## 2           1           3            2                 72         6        0
## 3           0           2           41                  0         2        0
## 4           0           0            0                  1         0        6
##   bartPOS anaPOS
## 1       0      0
## 2       0      0
## 3       0      0
## 4       0      0
# In order not to change the trait file, I will change the bartonella and anaplasma name 


names(Y)<- c("Hymenolepis","Davaineidae","Protospirura","Trichostrongylidae","Trichuris","Syphacia" ,"bart2","ana2" )

Now, we will include the trait file. This basically groups the Y depending on the transmission mode (direct, indirect or vector)

# Will you provide and file with traits (T) or a Phylogeny?
is.TP = TRUE    # traits will be included
is.P = FALSE    # Phylogeny will NOT be included



# READING IN TP: traits (T) and/or phylogenetic information in table format (P)
if(is.TP){
  # Read in the species names as rownames, not as a column of the matrix
  TP = read.csv("TP3.csv", stringsAsFactors=TRUE,header=T,row.names = 1)
  # The script below checks if the species names in TP are identical and in the same order as in Y
  # If the script prints "species names in TP and SXY match", you are ok.
  # If it says that they do not match, you need to modify the files so that they match 
  if(all(rownames(TP)==colnames(Y))) {
    print("species names in TP and SXY match")
  } else{
    print("species names in TP and SXY do not match")
  }
  # Modify the next two lines to split your TP file to components that relate to
  # Tr: species traits (note that T is a reserved word in R and that's why we use Tr)
  # P: phylogenetic information given by taxonomical levels, e.g. order, family, genus, species
  # If you don't have trait data, indicate this by Tr=NULL. 
  # If TP does not have phylogenetic data (because you don't have such data at all, or because
  # it is given in tree-format, like is the case in this example), indicate this with P=NULL 
  Tr = TP[,1:1]
  P = NULL
  # Check that the data looks as it should!
  View(Tr)
 
  # Check that the Tr data do not have missing values (they are allowed for Y but not S, X, P or Tr)
  if (any(is.na(Tr))) {
    print("Tr has NA values - not allowed for")
  } else {
    print("Tr looks ok")    }
  # Check that the phylogenetic/taxonomic data do not have missing values (they are allowed for Y but not S, X, P or Tr)
  if (any(is.na(P))) {
    print("P has NA values - not allowed for")
  } else {
    print("P looks ok") }
}
## [1] "species names in TP and SXY match"
## [1] "Tr looks ok"
## [1] "P looks ok"
Tr = data.frame(TP)
names(Tr)= c("Trans_Mode","Path")
Tr
##                    Trans_Mode  Path
## Hymenolepis          Indirect  Tape
## Davaineidae          Indirect  Tape
## Protospirura         Indirect  Nema
## Trichostrongylidae     Direct  Nema
## Trichuris              Direct  Nema
## Syphacia               Direct  Nema
## bart2                  Vector ZBact
## ana2                   Vector ZBact

Now let’s save the data!

#  Save the data!
Y = as.matrix(Y)

save(S,X,Y,Tr,file="allDataPOS.R")

We need to do the same for the negative model obviously.

# Y: species data
Y=SXY[,c(12:17,34,36)] 
# In order not to change the trait file, I will change the bartonella and anaplasma name 
names(Y)<- c("Hymenolepis","Davaineidae","Protospirura","Trichostrongylidae","Trichuris","Syphacia" ,"bart2","ana2" )

# Will you provide and file with traits (T) or a Phylogeny?
is.TP = TRUE    # traits will be included
is.P = FALSE    # Phylogeny will NOT be included

# READING IN TP: traits (T) and/or phylogenetic information in table format (P)
if(is.TP){
  # Read in the species names as rownames, not as a column of the matrix
  TP = read.csv("TP3.csv", stringsAsFactors=TRUE,header=T,row.names = 1)
  # The script below checks if the species names in TP are identical and in the same order as in Y
  # If the script prints "species names in TP and SXY match", you are ok.
  # If it says that they do not match, you need to modify the files so that they match 
  if(all(rownames(TP)==colnames(Y))) {
    print("species names in TP and SXY match")
  } else{
    print("species names in TP and SXY do not match")
  }
  # Modify the next two lines to split your TP file to components that relate to
  # Tr: species traits (note that T is a reserved word in R and that's why we use Tr)
  # P: phylogenetic information given by taxonomical levels, e.g. order, family, genus, species
  # If you don't have trait data, indicate this by Tr=NULL. 
  # If TP does not have phylogenetic data (because you don't have such data at all, or because
  # it is given in tree-format, like is the case in this example), indicate this with P=NULL 
  Tr = TP[,1:1]
  P = NULL
  # Check that the data looks as it should!
  View(Tr)
 
  # Check that the Tr data do not have missing values (they are allowed for Y but not S, X, P or Tr)
  if (any(is.na(Tr))) {
    print("Tr has NA values - not allowed for")
  } else {
    print("Tr looks ok")    }
  # Check that the phylogenetic/taxonomic data do not have missing values (they are allowed for Y but not S, X, P or Tr)
  if (any(is.na(P))) {
    print("P has NA values - not allowed for")
  } else {
    print("P looks ok") }
}
Tr = data.frame(TP)
names(Tr)= c("Trans_Mode","Path")
Tr

#  Save the data!
Y = as.matrix(Y)

save(S,X,Y,Tr,file="allDataNEG.R")

We are ready to construct the models!

3. Construct the models

Constructing the models is done via the following steps:

  1. Load the data
  2. Set X-covariates part of the model
  3. Set the traits part of the model
  4. Create the random part of the model
  5. Set the Y-data
  6. Create the full model and save it

3.1. Positive model with age based on eye lens

  1. Load the data
load(file=file.path("allDataPOS.R")) 
  1. Set X-covariates part of the model
# Make sure that exploration and stress sensitivity are numeric
X[,3:4] = apply(X[,3:4], 2, as.numeric)

# Create the fixed effects part of the model
XFormula = ~Sex + Age2+ Exploration + Stress_Sens 
  1. Set the traits part of the model
# We will look at the transmission mode
TrFormula = ~Trans_Mode 
  1. Create the random part of the model
# Create the random effects part of the model
studyDesign = data.frame(individual = as.factor(S$Individual), site = as.factor(S$Trap_Site)) # Create the study design

St = studyDesign$site # Set site as a random effect
rL.site = HmscRandomLevel(units = levels(St))

Ind = studyDesign$individual # Set the individual as an additional fixed effect
rL.individual = HmscRandomLevel(units = levels(Ind))
  1. Set the Y-data
Ypa = 1*(Y>0) # Transfer the data to presence/absence
# We are only interested in infection probability, rather than parasite load
  1. Create the full model and save it
# Presence-absence model
m1 = Hmsc(Y=Ypa,                     # y-variable (1/0)
          XData = X,                 # X-variables 
          XFormula = XFormula,       # Fixed effects part
          TrData = Tr,               # Trait data
          TrFormula = TrFormula,     # Trait effects part
          distr="probit"   ,         # Distribution
          studyDesign=studyDesign,   # Study design
          ranLevels={list("site" = rL.site,  # random effects
                          "individual" = rL.individual)})


# Place the model in a list 
models = list(m1)
modelnames = c("presence_absence") # Give them the proper names



# Save the model in the model folder
save(models,modelnames,file = file.path(ModelDir, "unfitted_model_POS_EyeLens_LAST"))

We will also do the same for the positive model with reproductive age as a fixed effect. But I will spare you the code

3.2. Negative model

Again, we’ll have two models, either with age based on the eye lens weight, and one with reproductive age

  1. Load the data
load(file=file.path("allDataNEG.R")) 
  1. Set X-covariates part of the model
# Make sure that exploration and stress sensitivity are numeric
X[,3:4] = apply(X[,3:4], 2, as.numeric)

# Create the fixed effects part of the model
XFormula = ~Sex + Age2+ Exploration + Stress_Sens 
  1. Set the traits part of the model
# We will look at the transmission mode
TrFormula = ~Trans_Mode 
  1. Create the random part of the model
# Create the random effects part of the model
studyDesign = data.frame(individual = as.factor(S$Individual), site = as.factor(S$Trap_Site)) # Create the study design

St = studyDesign$site # Set site as a random effect
rL.site = HmscRandomLevel(units = levels(St))

Ind = studyDesign$individual # Set the individual as an additional fixed effect
rL.individual = HmscRandomLevel(units = levels(Ind))
  1. Set the Y-data
Ypa = 1*(Y>0) # Transfer the data to presence/absence
# We are only interested in infection probability, rather than parasite load
  1. Create the full model and save it
# Presence-absence model
m1 = Hmsc(Y=Ypa,                     # y-variable (1/0)
          XData = X,                 # X-variables 
          XFormula = XFormula,       # Fixed effects part
          TrData = Tr,               # Trait data
          TrFormula = TrFormula,     # Trait effects part
          distr="probit"   ,         # Distribution
          studyDesign=studyDesign,   # Study design
          ranLevels={list("site" = rL.site,  # random effects
                          "individual" = rL.individual)})


# Place the model in a list 
models = list(m1)
modelnames = c("presence_absence") # Give them the proper names



# Save the model in the model folder
save(models,modelnames,file = file.path(ModelDir, "unfitted_model_NEG_EyeLens_LAST"))

Now we do the same, but using reproductive age instead of age based on eye lens weight (code is again hidden to make this more readable)

4. Run the models & check model convergence

Now that we have constructed the two models (positive and negative), we are able to run them.

In order to determine how when the model converges, we need to run the models multiple time and see how big the thinning needs to be.
Luckely, we can do loop this.

4.1. Positive model

4.1.1. Run the model

The following code runs the model. More precisely, it will run several different models with different thinnings. THis is needed to determine the minimal thinning needed for the model to converge.

But first, we need to run the model where we use age based on eye lens weight

# First, we need to load the model
load(file = "models/unfitted_model_POS_EyeLens_LAST")


# Setup the model runnings parameters
samples_list = c(5,250,250,250,250)        # We will take only 250 samples first, to make it fast
thin_list = c(1,1,10,100,500)              # We will increase the thinning with subsequent models
nChains = 5                            # number of chains

# Run the models 

for(Lst in 1:length(samples_list)){        # Creates the loop to run the different models
  thin = thin_list[Lst]                    # takes the Thinning number out the list
  samples = samples_list[Lst]              # Takes the Number of samples number out the list
  print(paste0("thin = ",as.character(thin),"; samples = ",as.character(samples)))   # This will print which model is running
  nm = length(models)  # this is not really important here, because we are running one model (compared to a hurdle model)
  for (model in 1:nm) { 
    print(paste0("model = ",modelnames[model]))   # ads the model name 
    m = models[[model]]                           # this takes the model
    
    # this is the actual running part
    m = sampleMcmc(m,                                            # Model info                               
                   samples = samples,                            # Number of samples
                   thin=thin,                                    # thinning interval
                   adaptNf=rep(ceiling(0.4*samples*thin),m$nr),  # Not sure
                   transient = ceiling(0.5*samples*thin),        # burning? or the one above
                   nParallel = 5,                                # paralleling the chain over the different cores
                   nChains = nChains)                            # Number of chains
    
    models[[model]] = m  # voila!
  }
  
  # Now, it will save all the different models in the model folder
  filename = paste("models/POSITIVE_EYELENS_MARKDOWN_models_thin_", as.character(thin),
                   "_samples_", as.character(samples),
                   "_chains_",as.character(nChains),
                   ".Rdata",sep = "")
  save(models,modelnames,file=filename)
}

Now we will run the model with reproductive age as fixed effect (code again hidden)

4.1.2. Check model convergence

This code will generate a PDF which we can use to determine of the model converged or not.

This is the model with age based on eye lens weight

# Use the same numbers as before
samples_list = c(5,250,250,250,250)        # We will take only 250 samples first, to make it fast
thin_list = c(1,1,10,100,500)              # We will increase the thinning with subsequent models
nChains = 5   
nst = length(thin_list) 


ma = NULL
na = NULL

# Run the loop
for (Lst in 1:nst) {
  thin = thin_list[Lst]
  samples = samples_list[Lst]
  
  
  filename = paste("models/POSITIVE_EYELENS_MARKDOWN_models_thin_", as.character(thin),
                   "_samples_", as.character(samples),
                   "_chains_",as.character(nChains),".Rdata",sep = "")
  load(filename)
  nm = length(models)
  for(j in 1:nm){
    mpost = convertToCodaObject(models[[j]], spNamesNumbers = c(T,F), covNamesNumbers = c(T,F))
    psrf.beta = gelman.diag(mpost$Beta,multivariate=FALSE)$psrf
    tmp = summary(psrf.beta)
    if(is.null(ma)){
      ma=psrf.beta[,1]
      na = paste0(as.character(thin),",",as.character(samples))
    } else {
      ma = cbind(ma,psrf.beta[,1])
      if(j==1){
        na = c(na,paste0(as.character(thin),",",as.character(samples)))
      } else {
        na = c(na,"")
      }
    }
  }
}
dev.new()
pdf(file=paste("models/MCMC_convergence_POS_EYELENS.pdf"))
par(mfrow=c(2,1))
vioplot(ma,col=rainbow_hcl(nm),names=na,ylim=c(0,max(ma)),main="psrf(beta)")
vioplot(ma,col=rainbow_hcl(nm),names=na,ylim=c(0.9,1.1),main="psrf(beta)")
dev.off()


# EXTRA FOR LATER
# vioplot(ma)
# max(ma)
# mean(ma)

# there is now a PDF file stored in the models map

We will run the same code for the model with reproductive age

Let’s take a look at the results generated from the code above.

The values need to be around 1. The two figures on the left are based on the model with age in days as fixed effect. The figures on the right are derived from the model with reproductive age as fixed effect.

It is clear this the models with a thining of 1-10 did not converge. THe models with 100-500 are a lot better.
From this, I think that it’s safe to say that a thinning of 1.000 should be sufficient for the full model.

4.1.3. Run the full model

We are now ready to run the full models.
Here is the code to run the full model with age based on eye lens weight as a a fixed effect

# First, we need to load the model
load(file = "models/unfitted_model_POS_EyeLens_LAST")


# Setup the model runnings parameters
samples_list = c(2000)           # We will take 2000 posterior samples
thin_list = c(1000)              # thinning is 1000
nChains = 5                      # number of chains

# Run the models 

for(Lst in 1:length(samples_list)){        # Creates the loop to run the different models
  thin = thin_list[Lst]                    # takes the Thinning number out the list
  samples = samples_list[Lst]              # Takes the Number of samples number out the list
  print(paste0("thin = ",as.character(thin),"; samples = ",as.character(samples)))   # This will print which model is running
  nm = length(models)  # this is not really important here, because we are running one model (compared to a hurdle model)
  for (model in 1:nm) { 
    print(paste0("model = ",modelnames[model]))   # ads the model name 
    m = models[[model]]                           # this takes the model
    
    # this is the actual running part
    m = sampleMcmc(m,                                            # Model info                               
                   samples = samples,                            # Number of samples
                   thin=thin,                                    # thinning interval
                   adaptNf=rep(ceiling(0.4*samples*thin),m$nr),  # Not sure
                   transient = ceiling(0.5*samples*thin),        # burning? or the one above
                   nParallel = 5,                                # paralleling the chain over the different cores
                   nChains = nChains)                            # Number of chains
    
    models[[model]] = m  # voila!
  }
  
  # Now, it will save all the different models in the model folder
  filename = paste("models/POSITIVE_EYELENS_FINAL_MARKDOWN_models_thin_", as.character(thin),
                   "_samples_", as.character(samples),
                   "_chains_",as.character(nChains),
                   ".Rdata",sep = "")
  save(models,modelnames,file=filename)
}

Now we will do the same for the model with reproductive age

Let’s quickly check if the models converged.

## [1] 1.001895
## [1] 1.000292
## [1] 1.008809
## [1] 1.000745

Yes, both models have converged (values should be around 1, which is the case here).

4.2. Negative model

4.2.1. Run the model

We need to do the same for the negative model as well.

So first the model with age based on eye lens weight as a fixed.

# First, we need to load the model
load(file = "models/unfitted_model_NEG_EyeLens_LAST")


# Setup the model runnings parameters
samples_list = c(5,250,250,250,250)        # We will take only 250 samples first, to make it fast
thin_list = c(1,1,10,100,500)              # We will increase the thinning with subsequent models
nChains = 5                            # number of chains

# Run the models 

for(Lst in 1:length(samples_list)){        # Creates the loop to run the different models
  thin = thin_list[Lst]                    # takes the Thinning number out the list
  samples = samples_list[Lst]              # Takes the Number of samples number out the list
  print(paste0("thin = ",as.character(thin),"; samples = ",as.character(samples)))   # This will print which model is running
  nm = length(models)  # this is not really important here, because we are running one model (compared to a hurdle model)
  for (model in 1:nm) { 
    print(paste0("model = ",modelnames[model]))   # ads the model name 
    m = models[[model]]                           # this takes the model
    
    # this is the actual running part
    m = sampleMcmc(m,                                            # Model info                               
                   samples = samples,                            # Number of samples
                   thin=thin,                                    # thinning interval
                   adaptNf=rep(ceiling(0.4*samples*thin),m$nr),  # Not sure
                   transient = ceiling(0.5*samples*thin),        # burning? or the one above
                   nParallel = 5,                                # paralleling the chain over the different cores
                   nChains = nChains)                            # Number of chains
    
    models[[model]] = m  # voila!
  }
  
  # Now, it will save all the different models in the model folder
  filename = paste("models/NEGATIVE_EYELENS_MARKDOWN_models_thin_", as.character(thin),
                   "_samples_", as.character(samples),
                   "_chains_",as.character(nChains),
                   ".Rdata",sep = "")
  save(models,modelnames,file=filename)
}

Now we’ll do the same for the model with reproductive age as fixed effect. Again the code is hidden because it is very similar.

4.2.2. Check model convergence

This code will generate a PDF which we can use to determine of the model converged or not. THe codes are the same as above, so we won’t include it (just to save some place)

The values need to be around 1. The two figures on the left are based on the model with age in days as fixed effect. The figures on the right are derived from the model with reproductive age as fixed effect.

It is clear this the models with a thining of 1-10 did not converge. THe models with 100-500 are a lot better.
From this, I think that it’s safe to say that a thinning of 1.000 should be sufficient for the full model.

4.2.3. Run the final model

Now we are ready to run the full model.

Again, this model includes the three ages classes, based on eye lens weight, as fixed effect

# First, we need to load the model
load(file = "models/unfitted_model_NEG_EyeLens_LAST")


# Setup the model runnings parameters
samples_list = c(2000)           # We will take 2000 posterior samples
thin_list = c(1000)              # thinning is 1000
nChains = 5                      # number of chains

# Run the models 

for(Lst in 1:length(samples_list)){        # Creates the loop to run the different models
  thin = thin_list[Lst]                    # takes the Thinning number out the list
  samples = samples_list[Lst]              # Takes the Number of samples number out the list
  print(paste0("thin = ",as.character(thin),"; samples = ",as.character(samples)))   # This will print which model is running
  nm = length(models)  # this is not really important here, because we are running one model (compared to a hurdle model)
  for (model in 1:nm) { 
    print(paste0("model = ",modelnames[model]))   # ads the model name 
    m = models[[model]]                           # this takes the model
    
    # this is the actual running part
    m = sampleMcmc(m,                                            # Model info                               
                   samples = samples,                            # Number of samples
                   thin=thin,                                    # thinning interval
                   adaptNf=rep(ceiling(0.4*samples*thin),m$nr),  # Not sure
                   transient = ceiling(0.5*samples*thin),        # burning? or the one above
                   nParallel = 5,                                # paralleling the chain over the different cores
                   nChains = nChains)                            # Number of chains
    
    models[[model]] = m  # voila!
  }
  
  # Now, it will save all the different models in the model folder
  filename = paste("models/NEGATIVE_EYELENS_FINAL_MARKDOWN_models_thin_", as.character(thin),
                   "_samples_", as.character(samples),
                   "_chains_",as.character(nChains),
                   ".Rdata",sep = "")
  save(models,modelnames,file=filename)
}

We will do the same, as usual, with the model with reproductive age as fixed effect

Let’s quickly check if the models converged.

## [1] 1.000503
## [1] 1.005297
## [1] 1.002612
## [1] 1.020796

Also here, both models converged nicely.

5. Show the parameter estimates

In this part, we’ll take a closer look at the results from the final models of the positive and negative model. We will layout the results next to each other and compare then.

Below is the code that is used to generate a pdf with the parameter estimates. The code here is written for the positive model with age in days (based on eye lense weight) as fixed effect. We have ran the same code for the other models, but this is not shown, since it’s the exact same code.

# POSITIVE MODEL EYE LENS



# Use the same numbers as before
samples = 2000         # Number of samples for the final model
thin = 1000            # Thinning of the final model
nChains = 5   
nst = length(thin_list) 

# load the final model
filename = paste("models/POSITIVE_EYELENS_FINAL_MARKDOWN_models_thin_", as.character(thin),
                   "_samples_", as.character(samples),
                   "_chains_",as.character(nChains),
                   ".Rdata",sep = "")


# Load models
load(filename)
nm = length(models)

filename = paste("panels/parameter_FINAL_estimates_MARK_POS_EYELENS.pdf")

pdf(file = filename)
for(j in 1:nm){
  #j=1
  m = models[[j]]
  
  VP = computeVariancePartitioning(m)
  vals = VP$vals
  mycols = rainbow(nrow(VP$vals))
  plotVariancePartitioning(hM=m, VP=VP,cols = mycols, args.leg=list(bg="white",cex=0.7),
                           main = paste0("Proportion of explained variance, ",modelnames[[j]]),cex.main=0.8)
  preds = computePredictedValues(m)
  MF = evaluateModelFit(hM=m, predY=preds)
  
  R2 = NULL
  if(!is.null(MF$TjurR2)){
    TjurR2 = MF$TjurR2
    vals = rbind(vals,TjurR2)
    R2=TjurR2
  }
  if(!is.null(MF$R2)){
    R2=MF$R2
    vals = rbind(vals,R2)
  }
  
  filename =  paste0("panels/parameter_estimates_VP_",modelnames[[j]],".csv")
  write.csv(vals,file=filename)
  
  if(!is.null(R2)){
    VPr = VP
    for(k in 1:m$ns){
      VPr$vals[,k] = (1-R2[k])*VPr$vals[,k]
    }
    
    VPr$vals = VPr$vals[,order(R2)]
    plotVariancePartitioning(hM=m, VP=VPr,cols = mycols, args.leg=list(bg="white",cex=0.7),ylim=c(0,1),
                             main=paste0("Proportion of raw variance, ",modelnames[[j]]),cex.main=0.8)
    filename =  paste0("panels/parameter_estimates_VP_Test_",modelnames[[j]],".csv")
    write.csv( VPr$vals,file=filename)
  }
  
}

for(j in 1:nm){
  #j = 2
  m = models[[j]]
  postBeta = getPostEstimate(m, parName="Beta")
  show.sp.names = (is.null(m$phyloTree) && m$ns<=20) 
  plotBeta(m, 
           post=postBeta, 
           supportLevel = 0.95,
           param="Sign",
           plotTree = !is.null(m$phyloTree),
           covNamesNumbers = c(TRUE,FALSE),
           spNamesNumbers=c(show.sp.names,FALSE),
           cex=c(0.6,0.6,0.8))
  mymain = paste0("BetaPlot, ",modelnames[[j]])
  if(!is.null(m$phyloTree)){
    mpost = convertToCodaObject(m)
    rhovals = unlist(poolMcmcChains(mpost$Rho))
    mymain = paste0(mymain,", E[rho] = ",round(mean(rhovals),2),", Pr[rho>0] = ",round(mean(rhovals>0),2))
  }
  title(main=mymain, line=2.5, cex.main=0.8)
  
  me = as.data.frame(t(postBeta$mean))
  me = cbind(m$spNames,me)
  colnames(me) = c("Species",m$covNames)
  po = as.data.frame(t(postBeta$support))
  po = cbind(m$spNames,po)
  colnames(po) = c("Species",m$covNames)
  ne = as.data.frame(t(postBeta$supportNeg))
  ne = cbind(m$spNames,ne)
  colnames(ne) = c("Species",m$covNames)
  vals = list("Posterior mean"=me,"Pr(x>0)"=po,"Pr(x<0)"=ne)
  filename = paste0("panels/parameter_estimates_Beta_",modelnames[j],".xlsx")
  writexl::write_xlsx(vals,path = filename)
}

for(j in 1:nm){
  if(m$nt>1){
    m = models[[j]]
    postGamma = getPostEstimate(m, parName="Gamma")
    plotGamma(m, post=postGamma, supportLevel = 0.9, param="Sign",
              covNamesNumbers = c(TRUE,FALSE),
              trNamesNumbers=c(m$nt<21,FALSE),
              cex=c(0.6,0.6,0.8))
    title(main=paste0("GammaPlot ",modelnames[[j]]), line=2.5,cex.main=0.8)
  }
}

for(j in 1:nm){
  #j=1
  m = models[[j]]
  OmegaCor = computeAssociations(m)
  supportLevel = 0.95
  for (r in 1:m$nr){
    #r = 1
    plotOrder = corrMatOrder(OmegaCor[[r]]$mean,order="AOE")
    plotOrder = 1:m$ns
    toPlot = ((OmegaCor[[r]]$support>supportLevel) + (OmegaCor[[r]]$support<(1-supportLevel))>0)*sign(OmegaCor[[r]]$mean)
    #colnames(toPlot)=rep("",m$ns)
    #rownames(toPlot)=rep("",m$ns)
    mymain = paste0("Associations, ",modelnames[[j]], ": ",names(m$ranLevels)[[r]])
    if(m$ranLevels[[r]]$sDim>0){
      mpost = convertToCodaObject(m)
      alphavals = unlist(poolMcmcChains(mpost$Alpha[[1]][,1]))
      mymain = paste0(mymain,", E[alpha1] = ",round(mean(alphavals),2),", Pr[alpha1>0] = ",round(mean(alphavals>0),2))
    }
    corrplot(toPlot[plotOrder,plotOrder], method = "color",
             col=colorRampPalette(c("blue","white","red"))(3),
             mar=c(0,0,1,0),
             main=mymain,cex.main=0.8)
    
    me = as.data.frame(OmegaCor[[r]]$mean)
    me = cbind(m$spNames,me)
    colnames(me)[1] = ""
    po = as.data.frame(OmegaCor[[r]]$support)
    po = cbind(m$spNames,po)
    colnames(po)[1] = ""
    ne = as.data.frame(1-OmegaCor[[r]]$support)
    ne = cbind(m$spNames,ne)
    colnames(ne)[1] = ""
    vals = list("Posterior mean"=me,"Pr(x>0)"=po,"Pr(x<0)"=ne)
    filename = paste0("panels/parameter_estimates_Omega_",modelnames[[j]],"_",names(m$ranLevels)[[r]],".xlsx")
    writexl::write_xlsx(vals,path = filename)
  }
}
dev.off()    

We end up with the following results from both models.

  1. Variance partitioning

We see that the age classes explain a lot of variation in some species, but not in others.

  1. Fixed effects

Here, we see the results from the 2 models with different covariates (A: model with age classes defined by eye lens weight, B = model with age defined by reproductive state). The blue points and error bars are the positve model, the red one is the negative model.

Basically, it looks like the results are quite similar.

Posterior mean responses (with their 95% credibility intervals) of the different helminth species to the fixed effects derived from the positive model where the age classes were defined based on eye lens weight.
Fixed_effects Anaplasma Bartonella Davaineidae Hymenolepis Protospirura Syphacia Trichostrongylidae Trichuris
Intercept -1.41 (-2.16;-0.74) -2.01 (-2.62;-1.48) -1.66 (-3.77;-0.63) -0.51 (-0.99;-0.12) -0.92 (-1.28;-0.58) -1.17 (-1.53;-0.82) -1.52 (-1.92;-1.15) -1.66 (-2.13;-1.26)
Sex (male) -0.17 (-0.67;0.31) -0.59 (-1.22;-0.02) -1.12 (-2.83;-0.27) -0.26 (-0.76;0.15) -0.81 (-1.26;-0.4) 0.26 (-0.12;0.63) -0.04 (-0.46;0.38) -0.03 (-0.46;0.4)
Exploration 0.04 (-0.2;0.28) -0.47 (-0.8;-0.18) -0.54 (-1.63;-0.06) 0.13 (-0.08;0.37) -0.17 (-0.38;0.04) -0.07 (-0.27;0.12) -0.01 (-0.23;0.21) -0.12 (-0.36;0.11)
Stress-sensitivity -0.03 (-0.27;0.21) -0.3 (-0.57;-0.03) -0.45 (-1.38;-0.02) 0.15 (-0.06;0.41) -0.1 (-0.3;0.09) -0.03 (-0.22;0.16) -0.02 (-0.24;0.2) -0.19 (-0.43;0.04)
Age
(50 < X < 200 days) 0.87 (0.38;1.36) 0.88 (0.3;1.52) 0.86 (0.17;1.89) 0.88 (0.43;1.47) 1.08 (0.67;1.51) 0.21 (-0.19;0.6) 0.34 (-0.09;0.78) 0.58 (0.14;1.07)
Age
(> 200 days) 0.5 (-0.3;1.25) 1.36 (0.64;2.11) 1.53 (0.49;3.14) 1.34 (0.65;2.2) 2.06 (1.41;2.8) 0.7 (0.12;1.26) 1.31 (0.75;1.88) 1.42 (0.82;2.07)
Posterior mean responses (with their 95% credibility intervals) of the different helminth species to the fixed effects derived from the negative model where the age classes were defined based on eye lens weight.
Fixed_effects Anaplasma Bartonella Davaineidae Hymenolepis Protospirura Syphacia Trichostrongylidae Trichuris
Intercept -1.75 (-2.47;-1.11) -2.29 (-2.93;-1.7) -1.79 (-3.88;-0.71) -0.52 (-0.99;-0.13) -0.91 (-1.27;-0.58) -1.18 (-1.54;-0.84) -1.53 (-1.93;-1.17) -1.64 (-2.06;-1.26)
Sex (male) 0.11 (-0.38;0.6) -0.27 (-0.91;0.31) -1.19 (-2.82;-0.3) -0.28 (-0.79;0.14) -0.8 (-1.23;-0.4) 0.27 (-0.09;0.64) -0.03 (-0.46;0.38) -0.01 (-0.45;0.42)
Exploration 0.06 (-0.2;0.31) -0.47 (-0.84;-0.14) -0.6 (-1.74;-0.07) 0.14 (-0.09;0.38) -0.17 (-0.37;0.03) -0.07 (-0.27;0.13) -0.01 (-0.23;0.21) -0.12 (-0.36;0.11)
Stress-sensitivity -0.04 (-0.29;0.2) -0.4 (-0.71;-0.11) -0.5 (-1.48;-0.02) 0.16 (-0.06;0.43) -0.1 (-0.3;0.1) -0.03 (-0.22;0.16) -0.03 (-0.25;0.19) -0.19 (-0.43;0.04)
Age
(50 < X < 200 days) 1.06 (0.54;1.58) 0.64 (0.02;1.3) 0.89 (0.14;1.91) 0.93 (0.46;1.52) 1.08 (0.68;1.5) 0.22 (-0.18;0.61) 0.34 (-0.09;0.77) 0.57 (0.12;1.04)
Age
(> 200 days) 0.59 (-0.23;1.39) 1.31 (0.56;2.09) 1.66 (0.58;3.21) 1.38 (0.69;2.21) 2.07 (1.44;2.74) 0.72 (0.15;1.26) 1.31 (0.76;1.86) 1.4 (0.83;2.01)
Posterior mean responses (with their 95% credibility intervals) of the different helminth species to the fixed effects derived from the positive model where the age classes were defined based on reproductive state.
Fixed_effects Anaplasma Bartonella Davaineidae Hymenolepis Protospirura Syphacia Trichostrongylidae Trichuris
Intercept -0.89 (-1.52;-0.34) -1.33 (-1.84;-0.95) -0.34 (-0.66;-0.03) 0.16 (-0.14;0.44) -0.01 (-0.33;0.3) -0.92 (-1.22;-0.64) -1.1 (-1.46;-0.79) -1.58 (-2.89;-0.96)
Sex (male) -0.17 (-0.64;0.3) -0.54 (-1.18;0.02) -0.66 (-1.09;-0.25) -0.24 (-0.61;0.12) -0.99 (-1.78;-0.46) 0.22 (-0.15;0.6) -0.08 (-0.52;0.36) -0.14 (-0.86;0.44)
Exploration -0.07 (-0.31;0.16) -0.57 (-0.94;-0.25) -0.3 (-0.52;-0.1) 0.02 (-0.16;0.2) -0.34 (-0.68;-0.09) -0.11 (-0.31;0.08) -0.09 (-0.32;0.12) -0.32 (-0.78;0.01)
Stress-sensitivity -0.04 (-0.26;0.18) -0.3 (-0.59;-0.03) -0.21 (-0.42;-0.01) 0.1 (-0.08;0.28) -0.13 (-0.41;0.11) -0.05 (-0.24;0.14) -0.05 (-0.27;0.18) -0.31 (-0.79;0.02)
Age
Juvenile -0.75 (-1.38;-0.16) -1.02 (-1.97;-0.22) -0.96 (-1.5;-0.43) -0.9 (-1.37;-0.45) -1.67 (-2.83;-0.94) -0.31 (-0.78;0.15) -0.58 (-1.16;-0.04) -0.91 (-1.99;-0.15)
Posterior mean responses (with their 95% credibility intervals) of the different helminth species to the fixed effects derived from the negative model where the age classes were defined based on reproductive state.
Fixed_effects Anaplasma Bartonella Davaineidae Hymenolepis Protospirura Syphacia Trichostrongylidae Trichuris
Intercept -1.11 (-1.68;-0.6) -1.64 (-2.1;-1.24) -0.35 (-0.7;-0.03) 0.17 (-0.12;0.46) 0 (-0.29;0.29) -0.94 (-1.26;-0.65) -1.14 (-1.54;-0.81) -1.69 (-3.31;-0.96)
Sex (male) 0.07 (-0.41;0.54) -0.21 (-0.8;0.35) -0.66 (-1.1;-0.24) -0.25 (-0.63;0.11) -0.92 (-1.57;-0.42) 0.22 (-0.16;0.61) -0.07 (-0.54;0.37) -0.15 (-0.97;0.48)
Exploration -0.06 (-0.3;0.18) -0.53 (-0.89;-0.2) -0.31 (-0.53;-0.1) 0.02 (-0.17;0.2) -0.32 (-0.6;-0.08) -0.11 (-0.31;0.08) -0.1 (-0.32;0.13) -0.34 (-0.87;0.02)
Stress-sensitivity -0.05 (-0.28;0.18) -0.39 (-0.68;-0.1) -0.22 (-0.44;-0.01) 0.1 (-0.07;0.3) -0.12 (-0.37;0.1) -0.05 (-0.24;0.14) -0.05 (-0.28;0.18) -0.34 (-0.88;0.01)
Age
Juvenile -0.66 (-1.32;-0.06) -0.78 (-1.68;-0.01) -0.95 (-1.52;-0.43) -0.9 (-1.4;-0.45) -1.59 (-2.57;-0.91) -0.31 (-0.79;0.15) -0.58 (-1.18;-0.05) -0.94 (-2.1;-0.14)

Let’s look at the predictions.

We’ll first look at effect of age (based on the different criteria) on the infection probability of Anaplasma and Bartonella (the main focus points of this studt)

  1. co-infections

On the site level

On the within-individual level

6. Compute model fit

The models’ explanatory and predictive power were analyzed using the species-specific AUC and Tjur’s R² (Tjur, 2009) values (Pearce and Ferrier, 2000). Explanatory power was computed by making predictions based on models fitted to all data. Predictive ability was calculated by performing 5-fold cross-validation. The sampling units were assigned randomly to five-folds, and predictions for each fold were based on models fitted to data on the remaining four-folds.

Here is the code for the positive model with eye lens weight to define the different age classes

# Load the data again (Adjust this)

samples = 2000
thin = 1000
nChains = 5 # number of chains


filename = paste("POSITIVE_EYELENS_FINAL_MARKDOWN_models_thin_", as.character(thin),
                 "_samples_", as.character(samples),
                 "_chains_",as.character(nChains),".Rdata",sep = "")
load(filename)


m = models[[1]]
# We next evaluate model fit in terms of explanatory power and prediction power
preds = computePredictedValues(m)
MF = evaluateModelFit(hM=m, predY=preds)

# We also evaluate model fit in terms of predictive power based on two-fold cross-validation (later 5 fold). Doing the latter takes a lot of time, as the model needs to be re-fitted twice. For this reason, we store the results from cross-validation.

# Set run.cross.validation = TRUE to perform the cross-validation and to save the results.

# Set run.cross.validation = FALSE to read in results from cross-validation that you have run previously.
library(statmod)

nm = length(models)

MF = list()
MFCV = list()
WAIC = list()

for(model in 1:nm){
  print(paste0("model = ",as.character(model)))
  m = models[[model]]
  preds = computePredictedValues(m)
  MF[[model]] = evaluateModelFit(hM=m, predY=preds)
  partition = createPartition(m, nfolds = 5)
  preds = computePredictedValues(m,partition=partition,nParallel = 5) #nParallel = nChains
  MFCV[[model]] = evaluateModelFit(hM=m, predY=preds)
  WAIC[[model]] = computeWAIC(m)
}

filename_out = paste("MODELFIT_POSITIVE_EYELENS_FINAL_MARKDOWN_models_thin_", as.character(thin),
                     "_samples_", as.character(samples),
                     "_chains_",as.character(nChains),
                     ".Rdata",sep = "")
save(MF,MFCV,WAIC,modelnames,file = filename_out)

Now, we do it for all the other models as well, but I won’t include the code to save space.

After we done all of this (which takes a lot of time, around 6 days per model), we can make the plots.